State Indexed Policy Search by Dynamic Programming
Authors
Abstract
We consider the reinforcement learning problem of simultaneous trajectory following and obstacle avoidance by a radio-controlled car. We choose a space-indexed, non-stationary controller policy class that is linear in the feature set, and learn the multiplier of each feature in each controller using the policy search by dynamic programming algorithm. Under the control of the learned policies, the radio-controlled car is shown to reliably follow a predefined trajectory while avoiding point obstacles along its way.
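The backward structure of policy search by dynamic programming with one linear controller per space index can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the one-dimensional lateral-offset state, the `features` vector, the toy `step` dynamics, the quadratic cost, the candidate weight grid, and the sampled state distribution are all hypothetical.

```python
import random

def features(y):
    # Hypothetical feature vector: lateral offset from the path and a bias term.
    return [y, 1.0]

def act(theta, y):
    # Each controller is linear in the features: u = theta . phi(y).
    return sum(w * f for w, f in zip(theta, features(y)))

def step(y, u):
    # Toy dynamics (assumption): the command shifts the offset directly.
    y_next = y + u
    return y_next, y_next ** 2  # next state, quadratic deviation cost

def cost_to_go(policies, d, y):
    # Roll out the controllers for space indices d, d+1, ..., in order.
    total = 0.0
    for theta in policies[d:]:
        y, c = step(y, act(theta, y))
        total += c
    return total

def psdp(num_indices, candidates, sample_states):
    # Backward pass: fix the controller at the last index first, then move
    # toward index 0, always rolling out the later, already chosen controllers.
    policies = [None] * num_indices
    for d in reversed(range(num_indices)):
        best, best_cost = None, float("inf")
        for theta in candidates:
            policies[d] = theta
            avg = sum(cost_to_go(policies, d, y) for y in sample_states)
            avg /= len(sample_states)
            if avg < best_cost:
                best, best_cost = theta, avg
        policies[d] = best
    return policies

rng = random.Random(0)
states = [rng.uniform(-1.0, 1.0) for _ in range(50)]  # assumed state samples
candidates = [(g, b) for g in (-1.5, -1.0, -0.5, 0.0) for b in (-0.1, 0.0, 0.1)]
learned = psdp(5, candidates, states)
```

In this toy setting the search is a brute-force scan over a small grid of weights; a realistic implementation would replace that scan with a proper optimizer and would draw the sample states from a distribution over states at each space index.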
Similar resources
An Optimal Tax Relief Policy with Aligning Markov Chain and Dynamic Programming Approach
In this paper, a Markov chain and dynamic programming were used to model a suitable scheme for tax relief and for reducing tax evasion, based on tax earnings in Iran from 2005 to 2009. Applying this model showed that tax evasion was 6714 billion Rials. With 4% relief to taxpayers and by calculating the present value of the received tax, it was reduced to 3108 billion Rials. ...
Tuning Approximate Dynamic Programming Policies for Ambulance Redeployment via Direct Search
In this paper we consider approximate dynamic programming methods for ambulance redeployment. We first demonstrate through simple examples how typical value function fitting techniques, such as approximate policy iteration and linear programming, may not be able to locate a high-quality policy even when the value function approximation architecture is rich enough to provide the optimal policy. ...
Optimizing Energy Production Using Policy Search and Predictive State Representations
We consider the challenging practical problem of optimizing the power production of a complex of hydroelectric power plants, which involves control over three continuous action variables, uncertainty in the amount of water inflows and a variety of constraints that need to be satisfied. We propose a policy-search-based approach coupled with predictive modelling to address this problem. This appr...
A Multi-Stage Single-Machine Replacement Strategy Using Stochastic Dynamic Programming
In this paper, the single-machine replacement problem is modeled within the frameworks of stochastic dynamic programming and control threshold policy, and some properties of the optimal values of the control thresholds are derived. Using these properties and by minimizing a cost function, the optimal values of two control thresholds for the time between productions of two successive nonco...
Probabilistic Differential Dynamic Programming
We present a data-driven, probabilistic trajectory optimization framework for systems with unknown dynamics, called Probabilistic Differential Dynamic Programming (PDDP). PDDP takes into account uncertainty explicitly for dynamics models using Gaussian processes (GPs). Based on the second-order local approximation of the value function, PDDP performs Dynamic Programming around a nominal traject...
Journal title:
Volume / Issue
Pages -
Publication date 2007